Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology

نویسنده

  • Jarmo Huuskonen
چکیده

An accurate and generally applicable method for estimating aqueous solubilities for a diverse set of 1297 organic compounds based on multilinear regression and artificial neural network modeling was developed. Molecular connectivity, shape, and atom-type electrotopological state (E-state) indices were used as structural parameters. The data set was divided into a training set of 884 compounds and a randomly chosen test set of 413 compounds. The structural parameters in a 30-12-1 artificial neural network included 24 atom-type E-state indices and six other topological indices, and for the test set, a predictive r2 = 0.92 and s = 0.60 were achieved. With the same parameters the statistics in the multilinear regression were r2 = 0.88 and s = 0.71, respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Support Vector Machines for the Estimation of Aqueous Solubility

Support Vector Machines (SVMs) are used to estimate aqueous solubility of organic compounds. A SVM equipped with a Tanimoto similarity kernel estimates solubility with accuracy comparable to results from other reported methods where the same data sets have been studied. Complete cross-validation on a diverse data set resulted in a root-mean-squared error = 0.62 and R(2) = 0.88. The data input t...

متن کامل

Estimation of Aqueous Solubility of Chemical Compounds Using E-State Indices

The molecular weight and electrotopological E-state indices were used to estimate by Artificial Neural Networks aqueous solubility for a diverse set of 1291 organic compounds. The neural network with 33-4-1 neurons provided highly predictive results with r(2) = 0.91 and RMS = 0.62. The used parameters included several combinations of E-state indices with similar properties. The calculated resul...

متن کامل

Application of Random Forest and Multiple Linear Regression Techniques to QSPR Prediction of an Aqueous Solubility for Military Compounds.

The relationship between the aqueous solubility of more than two thousand eight hundred organic compounds and their structures was investigated using a QSPR approach based on Simplex Representation of Molecular Structure (SiRMS). The dataset consists of 2537 diverse organic compounds. Multiple Linear Regression (MLR) and Random Forest (RF) methods were used for statistical modeling at the 2D le...

متن کامل

Prediction of the pharmaceutical solubility in water and organic solvents via different soft computing models

Solubility data of solid in aqueous and different organic solvents are very important physicochemical properties considered in the design of the industrial processes and the theoretical studies. In this study, experimental solubility data of 666 pharmaceutical compounds in water and 712 pharmaceutical compounds in organic solvents were collected from different sources. Three different artificia...

متن کامل

A Fuzzy ARTMAP Based on Quantitative Structure-Property Relationships (QSPRs) for Predicting Aqueous Solubility of Organic Compounds

Quantitative structure-property relationships (QSPRs) for estimating aqueous solubility of organic compounds at 25 degrees C were developed based on a fuzzy ARTMAP and a back-propagation neural networks using a heterogeneous set of 515 organic compounds. A set of molecular descriptors, developed from PM3 semiempirical MO-theory and topological descriptors (first-, second-, third-, and fourth-or...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of chemical information and computer sciences

دوره 40 3  شماره 

صفحات  -

تاریخ انتشار 2000